1.   Introduction

The  Folk  Theorem  for  repeated  games  asserts  that  any  feasible, 
individually  rational  payoffs  for  a  one-shot  game  can  arise  as  Nash 
equilibrium  average  payoffs  when  the  game  is  infinitely  repeated. 
In  our  [1986]  paper,  which  extends  this  result  to  subgame  perfect 
equilibrium  and  discounting,  we  assumed  that  the  players  can 
condition  their  play  on  the  realization  of  a  publicly  observed 
random  variable.   We  asserted,  however,  that  abandoning  the 
assumption  would  lead  to  only  a  slight  weakening  of  the  results; 
viz.,  any  feasible,  individually  rational  payoffs  can  be 
approximated  by  a  perfect  equilibrium  where  there  is  sufficiently 
little  discounting.   This  note  shows  that,  in  fact,  our  extension  of 
the  Folk  Theorem  holds  in  a  strong  sense  even  without  public 
randomization:  all  feasible  individually  rational  payoffs  can  be 
exactly  attained  in  equilibrium. 

Although  this  stronger  result  is  of  some  interest  by  itself,  its 
true  significance  appears  in  connection  with  mixed  strategies. 
Early  analyses  of  repeated  games  with  little  or  no  discounting 
(Aumann-Shapley  [1976],  Friedman  [1971]  and  Rubinstein  [1979]) 
restricted players to pure strategies, or equivalently, assumed that
a player's choice of a mixed strategy in any period is observable by
his fellow players. The assumption of pure strategies is restrictive
because typically the range of individually rational payoffs is
greater when players are allowed to use mixed strategies to punish
their opponents. The alternative hypothesis, that a player's
randomizations are ex post observable, is likewise strong.

Section 6 of our [1986] paper showed how to extend the Folk
Theorem to allow for mixed strategies when only a player's realized
actions, and not his choices of randomizing probabilities, are
observable. The key was the observation that a player can be
induced to use a mixed strategy to minimax an opponent by making her
continuation payoff depend on her current action in a way that
renders her exactly indifferent among the various choices in the
mixed strategy's support.

Our  argument  relied  on  public  randomization  to  ensure  that  any 
individually  rational  continuation  payoffs  can  be  exactly  attained. 
If,  without  public  randomization,  the  continuation  payoffs  could 
merely  be  approximated,  a  minimaxing  player  might  not  be  exactly 
indifferent  over  the  support  of  his  mixed  strategy,  and  our 
construction  would  fail.   Thus,  if  we  obtain  only  an  approximate 
version  of  the  Folk  Theorem  without  public  randomization,  our 
construction  cannot  accommodate  unobservable  mixed  strategies. 

Attaining  payoffs  exactly  is  also  essential  for  the  argument  in 
our  [1987a]  paper,  which  provided  sufficient  conditions  for  the  sets 
of  Nash  and  perfect  equilibrium  payoffs  to  coincide  for  discount 
factors  less  than  one.   Although  the  body  of  that  paper  assumed  the 
possibility  of  public  randomization,  our  results  here  imply  that 
this  assumption,  as  in  the  Folk  Theorem  paper,  is  unnecessary. 


2.   The  Model 

We  consider  a  finite  n-player  game  in  normal  form: 

$$g: A \to \mathbb{R}^n,$$

where $A = A_1 \times \cdots \times A_n$ and $A_i$ is player $i$'s action
space. Let $\Sigma_i$ be the set of player $i$'s mixed strategies, i.e.,
the probability distributions over $A_i$, and set
$\Sigma = \Sigma_1 \times \cdots \times \Sigma_n$. To simplify notation, we
will write $g_i(\sigma)$ for player $i$'s payoff given the mixed strategy
vector $\sigma \in \Sigma$.

In repeated versions of $g$, each player's probability mixture
over actions at time $t$ can depend on the actions chosen at all
previous times. More formally, let $h(t) \in A^{t} \equiv H(t)$ be the
realized actions from time zero through time $t-1$. Player $i$'s
strategy is a sequence of maps (one for each period) from $H(t)$ to
$\Sigma_i$. Note that, at any time $t$, player $i$'s strategy does not
depend on the past randomizing probabilities of his opponents, but
only on their realized actions.

In the infinitely repeated game $G_\delta$, each player $i$'s payoff is
the average discounted sum $\pi_i$ of his per-period payoffs, with common
discount factor $\delta$:

$$\pi_i = (1-\delta) \sum_{t=1}^{\infty} \delta^{t-1} g_i(\sigma(t)),$$

where $\sigma(t)$ is the probability distribution of actions chosen in
period $t$.

For each player $j$, choose "minimax strategies"
$m^j = (m^j_1, \ldots, m^j_n)$ so that

$$m^j_{-j} \in \arg\min_{\sigma_{-j}} \max_{\sigma_j} g_j(\sigma_j, \sigma_{-j})$$

and

$$v_j^* = \max_{a_j} g_j(a_j, m^j_{-j}) = g_j(m^j).$$

(Here "$m^j_{-j}$" is a mixed strategy selection for players other than
$j$, and $g_j(a_j, m^j_{-j}) = g_j(m^j_1, \ldots, m^j_{j-1}, a_j, m^j_{j+1}, \ldots, m^j_n)$.)
We call $v_j^*$ player $j$'s reservation value. Clearly, player $j$'s
average payoff must be at least $v_j^*$ in any equilibrium of $g$,
whether or not $g$ is repeated.
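As a rough illustration of why the choice between pure and mixed minimax strategies matters for the reservation value, the following sketch compares the two in a small game. The payoff matrix (matching pennies, from player 1's point of view), the function names, and the grid search over the opponent's mixing probability are all assumptions made for this example; none of them appear in the text.

```python
from fractions import Fraction

# Hypothetical example: player 1's payoffs in matching pennies.
# Rows are player 1's actions, columns are player 2's actions.
G1 = [[1, -1],
      [-1, 1]]


def pure_minimax(payoffs):
    """Min over the opponent's pure actions of player 1's best-reply payoff."""
    n_cols = len(payoffs[0])
    return min(max(row[c] for row in payoffs) for c in range(n_cols))


def mixed_minimax_2x2(payoffs, grid=100):
    """Min over opponent mixtures q (probability of column 0) on a finite grid.

    Player 1's best reply can be taken to be pure, because his expected
    payoff is linear in his own mixing probabilities.
    """
    best = None
    for k in range(grid + 1):
        q = Fraction(k, grid)
        value = max(q * row[0] + (1 - q) * row[1] for row in payoffs)
        if best is None or value < best:
            best = value
    return best


print(pure_minimax(G1))       # 1: a pure punishment leaves player 1 with 1
print(mixed_minimax_2x2(G1))  # 0: mixing 50/50 holds player 1 down to 0
```

In this example the reservation value under mixed punishments (0) is strictly below the one under pure punishments (1), so the set of individually rational payoffs is larger when mixed minimax strategies are allowed.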

Henceforth we shall normalize the payoffs of the game $g$ so that
$(v_1^*, \ldots, v_n^*) = (0, \ldots, 0)$. Call $(0, \ldots, 0)$ the minimax
point. Take $\bar{v}_i = \max_a g_i(a)$. Moreover, let

$$U = \{(v_1, \ldots, v_n) \mid \text{there exists } \sigma \in \Sigma \text{ with } g(\sigma) = (v_1, \ldots, v_n)\},$$

$$V = \text{Convex Hull of } U,$$

and

$$V^* = \{(v_1, \ldots, v_n) \in V \mid v_i > 0 \text{ for all } i\}.$$


3.   The Folk Theorem without Public Randomization

Our [1986] paper showed that if public randomization is allowed
and either $n=2$ or the dimension of $V^*$ equals $n$, then for any
payoff vector $v \in V^*$, there exists a discount factor
$\underline{\delta} < 1$ such that, for all $\delta \in (\underline{\delta}, 1)$,
there is a perfect equilibrium of $G_\delta$ with payoffs $v$. We
now demonstrate that public randomization is inessential for this
result.

Lemma 1 establishes that, for $\delta$ sufficiently large, all points
in $V$ are feasible and can be obtained without using mixed
strategies. That is, for any $v \in V$ there is a deterministic sequence
of actions $\{a(t)\}_{t=1}^{\infty}$ for which $v$ is the payoff vector.
This is not sufficient to establish the Folk Theorem, however, because,
even if $v \in V^*$, the sequence $\{a(t)\}$ might have the property that,
for some period $\tau$, the continuation payoffs beginning at $\tau$ do
not belong to $V^*$. In that case, some player would prefer to deviate
from the sequence, even if so doing caused his opponents to minimax him
thereafter.


1.  Of course, for low discount factors, public randomization
does make a difference. If $\delta$ is near zero, the payoff vector for
the sequence $\{a(t)\}$ is approximately $g(a(1))$, and so, quite apart
from equilibrium considerations, many payoffs in $V^*$ are not
feasible.

Building on Lemma 1, Lemma 2 shows that payoffs in $V^*$ can be
generated by a deterministic sequence in such a way that the
continuation payoffs always lie in $V^*$. Following Lemma 2, we
explain how our results allow us to do without public randomization
in the proof of the Folk Theorem.

Write $A = \{a^1, \ldots, a^m\}$ and, for each $j$, let $w^j = g(a^j)$.
Thus, $\{w^1, \ldots, w^m\}$ is the set of payoff vectors corresponding to
pure strategies.

Lemma 1:   If $\delta > 1 - 1/m$, then for any $v \in V$ there is a
sequence $\{a(t)\}$ of pure strategies whose average payoff is $v$.

Proof:   Let $v = \sum_{j=1}^{m} \lambda^j w^j$, where $0 \le \lambda^j \le 1$
and $\sum_{j=1}^{m} \lambda^j = 1$. We construct $\{a(t)\}$ as follows. Let
$I^j(t)$ be an index variable, which is 1 if $a(t) = a^j$ and 0
otherwise. Set $N^j(1) = 0$ for all $j$, and let

$$N^j(t) = \sum_{\tau=1}^{t-1} (1-\delta)\delta^{\tau-1} I^j(\tau) \quad \text{for } t > 1.$$

$N^j(t)$ is the "average discounted weight" given to strategy vector
$a^j$ before time $t$. Let

$$C(t) = \{\, j \mid \lambda^j - N^j(t) \ge \delta^{t-1}(1-\delta) \,\}.$$

Now define

$$j^*(t) = \arg\max_{j \in C(t)} \{\lambda^j - N^j(t)\}$$

(see footnote 2) and set $a(t) = a^{j^*(t)}$. This defines an algorithm
for computing $a(t)$.


2.  If  there  is  a  tie,  make  a  deterministic  selection. 


Claim 1:   The algorithm is well-defined, i.e., the set $C(t)$ is
never empty.


To prove the claim, assume to the contrary that at some time(s) $t$,
$C(t)$ is empty, and let $s$ be the first such time. Then
$\delta^{s-1}(1-\delta) > \lambda^j - N^j(s)$ for all $j$. Summing over
$j$, we have

$$m(1-\delta)\delta^{s-1} > 1 - \sum_{j=1}^{m} N^j(s) = 1 - \sum_{j=1}^{m} \sum_{\tau=1}^{s-1} (1-\delta)\delta^{\tau-1} I^j(\tau) = 1 - \sum_{\tau=1}^{s-1} (1-\delta)\delta^{\tau-1} = \delta^{s-1}. \tag{1}$$

But (1) contradicts our assumption that $m(1-\delta) < 1$, establishing
the claim.

Let $N^j(\infty) = \lim_{t \to \infty} N^j(t)$. (Because $N^j(t)$ is
increasing and bounded, this limit exists.)


Claim 2:   For all $j$, $N^j(\infty) = \lambda^j$.


To establish Claim 2, note first that, by construction,
$\sum_{j=1}^{m} N^j(\infty) = 1$. Moreover, $N^j(\infty)$ cannot exceed
$\lambda^j$, because $N^j$ increases (by $\delta^{t-1}(1-\delta)$) only when
$N^j(t) \le \lambda^j - \delta^{t-1}(1-\delta)$. Thus
$N^j(\infty) \le \lambda^j$ for each $j$, and, since
$\sum_{j=1}^{m} \lambda^j = 1$, $N^j(\infty) = \lambda^j$, proving the claim.


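Now, by construction, the payoffs corresponding to $\{a(t)\}$ are

$$(1-\delta) \sum_{t=1}^{\infty} \delta^{t-1} g(a(t)) = (1-\delta) \sum_{t=1}^{\infty} \delta^{t-1} \Big[ \sum_{j=1}^{m} I^j(t) w^j \Big] = \sum_{j=1}^{m} w^j (1-\delta) \sum_{t=1}^{\infty} \delta^{t-1} I^j(t) = \sum_{j=1}^{m} w^j N^j(\infty) = \sum_{j=1}^{m} \lambda^j w^j = v.$$

Q.E.D.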


Roughly speaking, the algorithm of Lemma 1 works as follows. By
definition, $v$ is a convex combination $\sum_j \lambda^j w^j$ of the pure
strategy payoff vectors $w^1, \ldots, w^m$. To generate $v$ as a discounted
average payoff over time, choose that pure strategy vector $a^j$ at time
$t$ for which the difference between $\lambda^j$ and the fraction of times
$a^j$ has been used up until $t$ (suitably weighted for discounting) is
largest.
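The construction in the proof of Lemma 1 can be sketched in a few lines of code. This is only an illustration under stated assumptions: the function name `lemma1_sequence`, the finite `horizon`, the use of exact rational arithmetic, and the tie-breaking rule (lowest index) are choices made for this sketch, and the example weights and discount factor are hypothetical.

```python
from fractions import Fraction


def lemma1_sequence(weights, delta, horizon):
    """Run the Lemma 1 selection rule for `horizon` periods.

    weights[j] plays the role of lambda^j; the returned list holds the
    index j*(t) chosen in each period t = 1, ..., horizon, and `used[j]`
    is the accumulated discounted weight N^j(horizon + 1).
    """
    m = len(weights)
    if not delta > 1 - Fraction(1, m):
        raise ValueError("the construction assumes delta > 1 - 1/m")
    used = [Fraction(0)] * m          # N^j(1) = 0 for all j
    step = 1 - delta                  # delta^(t-1) * (1 - delta) at t = 1
    sequence = []
    for _ in range(horizon):
        # C(t): indices whose remaining weight can absorb this period's step.
        eligible = [j for j in range(m) if weights[j] - used[j] >= step]
        # Largest remaining weight; max() returns the lowest index on ties.
        j_star = max(eligible, key=lambda j: weights[j] - used[j])
        sequence.append(j_star)
        used[j_star] += step
        step *= delta
    return sequence, used


weights = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
delta = Fraction(3, 4)                # satisfies delta > 1 - 1/3
sequence, used = lemma1_sequence(weights, delta, horizon=50)
print(sequence[:10])
print([float(w - u) for w, u in zip(weights, used)])
```

After `horizon` periods the remaining gaps `weights[j] - used[j]` are nonnegative and sum to `delta ** horizon`, so the truncated discounted weights approach the target weights geometrically, in line with Claim 2.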

The continuation payoffs at time $s$ associated with the sequence
$\{a(t)\}$ are simply

$$(1-\delta) \sum_{t=s}^{\infty} \delta^{t-s} g(a(t)),$$

i.e., the average discounted sum of per-period payoffs starting at
time $s$ and discounted to time $s$.
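To make the distinction between a sequence's overall payoff and its continuation payoffs concrete, here is a minimal sketch for a single player. It assumes continuation payoffs are normalized by $(1-\delta)$ so that they are comparable with average payoffs; the function name, the "finite prefix followed by a constant tail" representation, and the numbers are hypothetical.

```python
from fractions import Fraction


def continuation_payoffs(prefix, tail, delta):
    """Normalized continuation payoffs for one player.

    The payoff stream is `prefix` followed by `tail` in every later period.
    Entry k of the result is the continuation payoff from period k + 1
    onward, discounted to period k + 1; the last entry is the tail value.
    """
    values = [Fraction(tail)]  # a constant stream has average value `tail`
    for payoff in reversed(prefix):
        # c_s = (1 - delta) * g(a(s)) + delta * c_{s+1}
        values.append((1 - delta) * payoff + delta * values[-1])
    values.reverse()
    return values


delta = Fraction(3, 4)
values = continuation_payoffs(prefix=[4], tail=-1, delta=delta)
print(values)  # [Fraction(1, 4), Fraction(-1, 1)]
```

With the minimax payoff normalized to zero, the stream has a positive average payoff from the first period (1/4) but a negative continuation payoff from the second period on (-1), so the player would rather deviate and be minimaxed. This is the problem that Lemma 2 is designed to rule out.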


Lemma 2:   For every $\varepsilon > 0$ and closed set $\hat{V} \subset V^*$ with
$\min_{i,\, v \in \hat{V}} v_i > 5\varepsilon$, there exists $\underline{\delta} < 1$
such that, for all $\delta \in (\underline{\delta}, 1)$ and every
$v \in \hat{V}$, there is a sequence $\{a(t)\}$ of pure strategies whose
discounted average payoffs are $v$, and whose continuation payoffs at
any time $t$ are at least $\varepsilon$ for each player.


Remark:   To prove that public randomization is inessential for
the Folk Theorem with observable mixed strategies, it would suffice
to show that, for any individually rational payoff $v$, there exists
$\underline{\delta}$ such that, if $\delta$ exceeds $\underline{\delta}$, $v$ can be
generated by a deterministic sequence whose continuation payoffs are
individually rational. Lemma 2 establishes a stronger property; it
asserts that a fixed $\underline{\delta}$ works uniformly for the entire set
$\hat{V}$. We provide the stronger result because it is needed to show
that public randomization is inessential with unobservable mixed
strategies and in our [1987] paper.

Proof:   Let $Z$ be the polygon corresponding to the intersection
of the set $V$ with the inequality constraints $v_i \ge 3\varepsilon$, and
let $\{z^j\}$ be the $J$ vertices of $Z$. Clearly, $\hat{V} \subset Z$. Let
$\tilde{Z}$ be a polygon with vertices $\{\tilde{z}^j\}$ such that (i) each
$\tilde{z}^j$ is within $\varepsilon$ of $z^j$; (ii) $\tilde{z}^j = z^j$ if
$z^j \in \{w^1, \ldots, w^m\}$; and (iii) $\tilde{z}^j$ can be expressed as a
weighted average $\sum_k \lambda^k(j) w^k$, where each weight
$\lambda^k(j)$ is a rational number. Observe that $\hat{V} \subset \tilde{Z}$.


Because the $\lambda^k(j)$'s are rational, we can find integers
$(r^k(j))_{k=1}^{m}$ and $d$ such that for all $j$ and $k$,
$\lambda^k(j) = r^k(j)/d$. Let "cycle $j$" be the $d$-period sequence of
pure strategies in which $a^1$ is played for the first $r^1(j)$ periods;
$a^2$ is played for the next $r^2(j)$ periods; and so on for all $k$
between 3 and $m$. (Recall that $w^k = g(a^k)$.) Let $z^j(\delta)$ be the
discounted average payoffs corresponding to cycle $j$. Note that if
$\tilde{z}^j$ is on the boundary of $V$, then $z^j(\delta)$ will be on the
boundary as well. If we set

$$R^k(j) = \sum_{s=1}^{k} r^s(j), \quad \text{with } R^0(j) = 0,$$

then

$$z^j(\delta) = \sum_{k=1}^{m} \; \sum_{s=R^{k-1}(j)}^{R^k(j)-1} (1-\delta)\,\delta^{s}\, w^k \big/ (1-\delta^{d}).$$

Choose $\underline{\delta}$ so that, for all $\delta$ greater than
$\underline{\delta}$ and all $j$, $z^j(\delta)$ is within $\varepsilon$ of $z^j$.
By construction, for all $\delta > \underline{\delta}$, $\hat{V}$ is contained
in the polygon $Z(\delta)$ whose vertices are the $z^j(\delta)$'s.

We now apply the algorithm of Lemma 1 to generate each $v \in \hat{V}$ by
a deterministic sequence of the $z^j(\delta)$'s for $\delta > \delta'$,
where $\delta' = \max(\underline{\delta}, 1 - 1/J)$. Earlier, when the
payoffs $w^j$ were called for in a given period $t$, we set
$a(t) = a^j$. In our current application, we replace the $w^j$'s with
the $z^j(\delta)$'s. Moreover, when the algorithm calls for payoffs
$z^j(\delta)$, we assign cycle $j$ as the actions for the next $d$
periods. The Lemma 1 algorithm so modified guarantees that we can
generate each of the payoffs in $\hat{V}$ by a deterministic sequence of
these cycles. Because each cycle is of length $d$ and each $z^j(\delta)$
gives each player a payoff of at least $2\varepsilon$, the continuation
payoffs starting at any time $t$ give each player at least $\varepsilon$
if $\delta$ is taken large enough to satisfy
$(1-\delta^{d})\underline{g} + \delta^{d}(2\varepsilon) > \varepsilon$, where
$\underline{g} = \min_{i,a} g_i(a)$ is the lowest possible value of any
player's payoff.

Q.E.D.

To summarize, the algorithm of Lemma 1 shows how to attain any
$v \in V$ by a deterministic sequence of $w^j$'s. Lemma 2 replaces each
$w^j$ that is not individually rational with a payoff vector
$z^j(\delta)$ that itself can be attained through a finite cycle of
$w^k$'s. Hence, to obtain $v \in V^*$ through a deterministic sequence,
(i) apply the Lemma 1 algorithm using the $z^j(\delta)$'s instead of the
$w^j$'s; and (ii) whenever the algorithm calls for $z^j(\delta)$, replace
it with the corresponding $d$-period cycle.
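The cycle payoffs used in Lemma 2 are straightforward to compute. The sketch below builds a $d$-period cycle from integer counts and evaluates its discounted average payoff for one player; the function names, the scalar payoffs, and the particular counts are assumptions made for this illustration.

```python
from fractions import Fraction


def build_cycle(counts, payoffs):
    """Play the k-th pure strategy vector for counts[k] consecutive periods."""
    cycle = []
    for count, payoff in zip(counts, payoffs):
        cycle.extend([payoff] * count)
    return cycle


def cycle_average(cycle, delta):
    """Discounted average payoff of a d-period cycle:
    (1 - delta) * sum_s delta^s * x_s / (1 - delta^d), with s = 0, ..., d-1.
    """
    d = len(cycle)
    total = sum((1 - delta) * delta ** s * x for s, x in enumerate(cycle))
    return total / (1 - delta ** d)


# Hypothetical one-player example: payoffs -1 and 3 with weights 1/4 and 3/4,
# so the undiscounted weighted average is 2.
cycle = build_cycle(counts=[1, 3], payoffs=[-1, 3])
print(cycle_average(cycle, Fraction(9, 10)))             # 6317/3439
print(float(cycle_average(cycle, Fraction(999, 1000))))  # close to 2
```

As the discount factor approaches one, the cycle's discounted average payoff approaches the rational weighted average that defines the cycle, which is why a sufficiently large discount factor keeps each cycle payoff close to its target vertex.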

To see how Lemma 2 enables us to do without public randomization
in the proof of the Folk Theorem, we first recall the form of the
strategies in our [1986] paper. To obtain the point $v \in V^*$, we had
players use publicly correlated actions (generating $v$ in each period)
as long as no player deviates. If player $i$ deviates, we provided a
"punishment equilibrium" in which (a) for a certain number of
periods, the player's opponents minimax him and he responds
optimally, and then (b) the players revert to a more "cooperative"
mode in which their payoffs are $v^i$, where this vector is chosen so
as to induce $i$'s punishers to go through with their minimaxing and
so that player $i$'s overall payoff is less than $\varepsilon$. Like $v$,
it is generated by publicly correlated actions. From Lemma 2, we can
replace the publicly correlated actions yielding $v$ and $v^i$ with
deterministic sequences whose continuation payoffs are greater than
$\varepsilon$. Because deviation leads to payoffs less than $\varepsilon$, no
player will wish to deviate from such a sequence. In the case where
mixed strategies are observable, this is the only change to the proof
required to eliminate public randomization.

The case where only players' realized actions (and not the
randomizations themselves) are observable presents an additional
complication. If, to minimax player $i$, player $j$ uses a mixed
strategy, he must be indifferent among the various actions over
which he randomizes. Our [1986] proof ensured this indifference by
making $j$'s continuation payoff after the punishment phase contingent
on his actions during the phase. It is important here that
precisely specified values for the continuation payoffs be
attainable; it would not suffice merely to approximate them. Lemma
2 shows, however, that these exact values can, in fact, be attained.$^{3}$
Thus public randomization is inessential in this case too.$^{4}$